Text Alignment for Real-Time Crowd Captioning
نویسندگان
چکیده
The primary way of providing real-time captioning for deaf and hard of hearing people is to employ expensive professional stenographers who can type as fast as natural speaking rates. Recent work has shown that a feasible alternative is to combine the partial captions of ordinary typists, each of whom types part of what they hear. In this paper, we describe an improved method for combining partial captions into a final output based on weighted A search and multiple sequence alignment (MSA). In contrast to prior work, our method allows the tradeoff between accuracy and speed to be tuned, and provides formal error bounds. Our method outperforms the current state-of-the-art on Word Error Rate (WER) (29.6%), BLEU Score (41.4%), and F-measure (36.9%). The end goal is for these captions to be used by people, and so we also compare how these metrics correlate with the judgments of 50 study participants, which may assist others looking to make further progress on this problem.
منابع مشابه
Sliding Alignment Windows for Real-Time Crowd Captioning
The primary way of providing real-time speech to text captioning for hard of hearing people is to employ expensive professional stenographers who can type as fast as natural speaking rates. Recent work has shown that a feasible alternative is to combine the partial captions of ordinary typists, each of whom is able to type only part of what they hear. In this paper, we extend the state of the a...
متن کاملUsing keyword spotting to help humans correct captioning faster
Automatic real-time captioning provides immediate and on demand access to spoken content in lectures or talks, and is a crucial accommodation for deaf and hard of hearing (DHH) people. However, in the presence of specialized content, like in technical talks, automatic speech recognition (ASR) still makes mistakes which may render the output incomprehensible. In this paper, we introduce a new ap...
متن کاملToward Scalable Social Alt Text: Conversational Crowdsourcing as a Tool for Refining Vision-to-Language Technology for the Blind
The access of visually impaired users to imagery in social media is constrained by the availability of suitable alt text. It is unknown how imperfections in emerging tools for automatic caption generation may help or hinder blind users’ understanding of social media posts with embedded imagery. In this paper, we study how crowdsourcing can be used both for evaluating the value provided by exist...
متن کاملBroadcast Technology
Closed captioning to convey the speech of TV programs by text is becoming a useful means of providing information for elderly people and the hearing impaired, and real-time captioning of live programs is expanding yearly thanks to the use of speech recognition technology and special keyboards for high-speed input. This paper describes the current state of closed captioning, provides an overview...
متن کاملLinefeed Insertion into Japanese Spoken Monologue for Captioning
To support the real-time understanding of spoken monologue such as lectures and commentaries, the development of a captioning system is required. In monologues, since a sentence tends to be long, each sentence is often displayed in multi lines on one screen, it is necessary to insert linefeeds into a text so that the text becomes easy to read. This paper proposes a technique for inserting linef...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013